Abstract


Coronaviruses recently emerged as major human pathogens causing outbreaks 
of severe acute respiratory syndrome and Middle-East respiratory syndrome. 
They utilize the spike (S) glycoprotein anchored in the viral envelope to mediate 
host attachment and fusion of the viral and cellular membranes to initiate 
infection. The S protein is a major determinant of the zoonotic potential of 
coronaviruses and is also the main target of the host humoral immune response. 
We report here the 3.5 Å resolution cryo-electron microscopy structure of the S
glycoprotein trimer from the pathogenic porcine deltacoronavirus (PDCoV), which 
belongs to the recently identified delta genus. Structural and glycoproteomics 
data indicate that the glycans of PDCoV S are topologically conserved when 
compared with the human respiratory coronavirus HCoV-NL63 S, resulting in 
similar surface areas being shielded from neutralizing antibodies and implying 
that both viruses are under comparable immune pressure in their respective 
hosts. The structure further reveals a shortened S2’ activation loop, containing a
reduced number of basic amino acids, which contributes to rendering the spike
largely protease-resistant. This property distinguishes PDCoV S from recently 
characterized betacoronavirus S proteins and suggests that the S protein of 
enterotropic PDCoV has evolved to tolerate the protease-rich environment of the 
small intestine and to fine-tune its fusion activation to avoid premature triggering
and reduction of infectivity.


Importance


Coronaviruses use transmembrane spike (S) glycoprotein trimers to promote host 
attachment and fusion of the viral and cellular membranes. We determined a near-atomic
resolution cryo-electron microscopy structure of the S ectodomain trimer from
the pathogenic porcine deltacoronavirus (PDCoV), which is responsible for diarrhea in 
piglets and has had devastating consequences for the swine industry worldwide. 
Structural and glycoproteomics data reveal that PDCoV S is decorated with 78 N-linked 
glycans obstructing the protein surface to limit accessibility to neutralizing antibodies in 
a way reminiscent of what has recently been described for a human respiratory 
coronavirus. PDCoV S is largely protease-resistant, which distinguishes it from most
other characterized coronavirus S glycoproteins and suggests that enteric
coronaviruses have evolved to fine-tune fusion activation in the protease-rich
environment of the small intestine of infected hosts.


Introduction

Coronaviruses are large enveloped viruses, with single-stranded positive-sense RNA 
genomes, classified in four genera (α, β, γ, and δ) based on their sequence similarity.
Most recognized coronaviruses are animal viruses but four coronaviruses, namely 
HCoV-229E, HCoV-OC43, HCoV-NL63 and HCoV-HKU1, are known to continuously 
circulate in the human population and are associated with up to 30% of respiratory tract 
infections(1). In addition, severe acute respiratory syndrome (SARS-CoV) and Middle-East
respiratory syndrome (MERS-CoV) coronaviruses are zoonotic viruses causing
deadly pneumonia in humans(2). SARS-CoV and MERS-CoV have resulted in more
than 8,000 and 2,000 cases with fatality rates of 10 and 35%, respectively. No specific
antiviral treatments or vaccines are approved for human coronaviruses and zoonosis
remains a great pandemic threat.


The ability to recognize the appropriate receptor and to efficiently enter host cells are 
key requirements for cross-species spillover of zoonotic viruses such as influenza(3). 
For coronaviruses, these two functions are carried out by the spike (S) glycoprotein. 
Therefore, structural and functional studies of S glycoproteins can provide invaluable 
information to evaluate the cross-species transmission potential of these viruses. The 
coronavirus S protein is a class I viral fusion protein that forms homotrimers decorating
the viral envelope. It is composed of an N-terminal S1 subunit, responsible for receptor
binding, and a C-terminal S2 subunit, which contains the fusion machinery. The
combined activities of the two subunits promote coronavirus attachment to host cells
and subsequent fusion of the viral and cellular membranes, via irreversible
conformational changes, initiating viral infection. Since it is the major surface protein, S
is also the main target of neutralizing antibodies during infection and a focus of vaccine
design.


The zoonotic potential of coronaviruses is determined by the receptor-binding properties 
of the S protein. For instance, SARS-CoV and MERS-CoV bind with high affinity to their
cognate human receptors, angiotensin-converting enzyme 2 (ACE2) and dipeptidyl
peptidase 4 (DPP4), respectively(4, 5). Metagenomic data revealed that many
MERS-CoV and SARS-CoV-like viruses exist in bats and one such virus, WIV-1, isolated from
bat feces, shares 99.9% nucleotide sequence identity with SARS-CoV. The S protein
encoded by WIV-1 binds human, bat and civet ACE2 orthologues, allowing the virus to
efficiently infect human cells expressing any of these three orthologues(6, 7). Similarly,
HKU4-CoV and HKU5-CoV, which are closely related to MERS-CoV, have been identified
in bats and HKU4-CoV can be adapted to bind human DPP4 by substituting three
amino acids in the S receptor-binding domain(8, 9).


The zoonotic potential of coronaviruses is further determined by fusion activation, which
requires S processing by host proteases. Up to two cleavage sites are present in S
glycoproteins: a site found at the boundary between the S1 and S2 subunits of some
coronavirus S (the S1/S2 site) and a conserved site upstream from the fusion peptide
(the S2’ site)(10).


For a subset of coronaviruses, such as MHV, SARS-CoV and MERS-CoV, the S
glycoprotein is cleaved at the S1/S2 junction during biogenesis and viral egress(10-13).
This proteolytic event, along with subsequent binding to the host receptor, enhances
processing at the S2’ site and participates in MERS-CoV or SARS-CoV fusion
activation(11, 13). Moreover, substitution of two residues at the boundary between the
S1 and S2 subunits enables efficient processing by human proteases and allows the
bat-infecting HKU4-CoV S protein to mediate entry into human cells(14).


Proteolysis at the conserved S2’ site is essential for fusion activation of all characterized
coronavirus S proteins and it can occur at the host membrane or in internal cellular
compartments. For instance, transmembrane protease/serine protease (TMPRSS)
processing of SARS-CoV and MERS-CoV S at the cell membrane, furin-mediated
processing of HCoV-NL63 and MERS-CoV S in the early endosomes, or
endo-lysosomal protease-mediated triggering of SARS-CoV S (by cathepsin L) and MHV S
are key events orchestrating spatial and temporal activation of fusion to ensure
successful viral entry into host cells(12, 13, 15). Alternatively, porcine epidemic diarrhea
coronavirus (PEDV), which replicates in the epithelial cells of the small intestine,
undergoes S proteolytic activation by trypsin, which is highly abundant in the lumen of
this organ(16). These examples illustrate how the availability of host proteases and the
mechanism of proteolytic activation can directly restrict coronavirus activation, viral
tropism, and pathogenesis.


One common pattern shared by both SARS and MERS outbreaks is that although they
both originated in bats, an intermediate host with closer physical proximity to humans
allowed for more efficient cross-species transmission. Palm civets and camels were the
most probable intermediate hosts for SARS-CoV and MERS-CoV, respectively(7, 17,
18). Due to their proximity to humans, pigs also acted as intermediate hosts for the
influenza pandemic (19) and for the emergence of Nipah virus in Malaysia(20). To date,
only α- and β-coronaviruses have been implicated in human diseases and several S
glycoproteins from viruses belonging to these two genera have been structurally
characterized(21-26). To the best of our knowledge, no porcine coronaviruses have
crossed the species barrier to infect humans, and their receptor usage appears to favor
porcine orthologues. Porcine epidemic diarrhea virus (PEDV), however, can infect pig,
human, monkey and bat cells, suggesting it has the potential to spill over to species
other than pig(27). As a result, cross-species transmission of coronaviruses poses an
imminent and long-term threat to human health, which emphasizes the need for
surveying and studying these viruses to prevent and control infections.


The recently emerged porcine deltacoronavirus (PDCoV) is responsible for diarrhea in 
piglets and has had devastating consequences for the swine industry worldwide(28, 29). 
No vaccines or treatments are available for PDCoV. Here, we report the cryoEM 
structure of the PDCoV S trimer revealing that it has a molecular architecture most 
closely related to the S glycoproteins of the α-genus of coronaviruses. Integrating
structural and glycoproteomics data, we discovered that PDCoV S masks potential
epitopes with glycans in a way reminiscent of the human respiratory α-coronavirus
HCoV-NL63 S glycoprotein(22). These results support a relatedness between α- and
δ-coronavirus S glycoproteins and suggest that the immune system of infected hosts exerts
comparable selection pressure on these viruses, which has led to these adaptations.
The structure also reveals that the C-terminal S2 fusion machinery of the PDCoV S protein
features a short S2’ activation loop which appears to be largely resistant to proteolysis
by trypsin/chymotrypsin. We conclude that PDCoV has evolved to be highly adapted to
the protease-rich environment of the enteric tract to ensure proper spatial and temporal
activation of fusion and prevent premature triggering, which would significantly impact
virus infectivity.


Results 


Structure determination of the PDCoV S glycoprotein 

PDCoV was first identified in Hong Kong in 2012(29) and it has since spread rapidly in 
the swine population across the globe(28, 29). Due to its recent emergence, relatively 
little is known about this virus compared to other swine coronaviruses. One feature that 
distinguishes PDCoV from other known coronaviruses is that it encodes one of the 
smallest S glycoproteins. We therefore set out to explore the architectural diversity of S
proteins across coronavirus genera to understand shared and unique features of the
structurally uncharacterized δ-genus.


We used Drosophila S2 cells to produce the PDCoV/USA/Illinois121/2014 S
ectodomain (residues 1-1098) with a C-terminal fusion adding a GCN4 trimerization
motif and a strep-tag(30). Following sample vitrification by triple blotting(31), data were
acquired on an FEI Titan Krios electron microscope equipped with a Gatan Quantum
GIF energy filter operated in zero-loss mode and a Gatan K2 Summit electron-counting
camera operated in super-resolution mode (Fig 1A-B). We determined a 3D
reconstruction at 3.5 Å resolution resolving most amino acid side chains, disulphide
bonds and N-linked glycans (Fig S1A). These features were used as fiducials to confirm
the sequence register during model building (Fig 1C-F and Fig S1B-E). Starting from the
HCoV-NL63 S structure(22), we obtained an atomic model of the PDCoV S trimer using
manual modeling in Coot(32) and Rosetta density-guided iterative refinement(33). The
final model comprises residues 52 to 1021 and 21 N-linked glycans (Table 1).


The PDCoV S protein assembles as a compact trimer with a height of ~145 Å and a
width of 115 Å (Fig 1C-D). The S1 subunit has a modular organization comprising four
distinct domains, designated A, B, C and D, whereas the S2 subunit adopts a
mostly-helical elongated architecture with a connector domain appended to its C-terminal
end(21, 22) (Fig 1E-F).


The extensive PDCoV S glycan shield 

The unsharpened PDCoV S map resolves 21 N-linked glycans for each protomer that 
form prominent protrusions extending from the protein surface (Fig 2A-B and Fig S1F-G).
Using on-line reversed-phase liquid chromatography with electron transfer/high-energy
collision-dissociation tandem mass-spectrometry(34), we detected 16 N-linked
glycosylation sites corresponding to those observed in the cryoEM map and confirmed 5
additional sites located in the structurally unresolved N and C-terminal parts of the



Journal of Virology 



protein (Fig 2C and Table S1). Combining our structural and mass-spectrometry data, 
we found evidence for glycosylation at 26 out of 27 possible NXS/T glycosylation 
sequons. The intact glycopeptides detected by MS/MS for PDCoV S expressed in 
Drosophila S2 cells corresponded mostly to paucimannosidic glycans containing 3 
mannose residues (with or without core fucosylation) and oligomannose glycans 
containing 4 to 9 mannose residues. We also detected complex glycans (with or without 
core fucosylation), which appears compatible with the accessibility and crowding of
these carbohydrate chains that would permit processing(35, 36).
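
As an illustration of how candidate N-linked glycosylation sequons such as those counted above can be enumerated from a sequence, the following minimal sketch scans for the N-X-S/T motif. It assumes the standard convention that X is any residue except proline; the function name and the toy sequence are hypothetical and are not taken from the PDCoV S sequence or from the analysis pipeline used in this study.

```python
import re

# N-X-S/T sequon; X is assumed to be any residue except proline.
# A zero-width lookahead lets overlapping sequons (e.g. "NNTS") be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration only (not PDCoV S).
toy_sequence = "MNGSAANPTLLNATNNTS"
sites = find_sequons(toy_sequence)
print("candidate sequons at Asn positions:", sites)

# Fraction of sequons with evidence of glycosylation reported in the text.
observed, possible = 26, 27
print(f"sequon occupancy: {observed}/{possible} = {observed / possible:.1%}")
```

A sequence-based scan only lists potential sites; whether a given sequon is actually glycosylated has to be established experimentally, as done here by cryoEM and mass spectrometry.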


Overall, the glycan coverage of PDCoV S is dense and extensively decorates the 
accessible surface of the trimer. Although we detected substantially more N-linked 
glycans for HCoV-NL63 S(22) (34 sites per protomer), 6 validated glycans reside within 
the N-terminal domain 0, which is absent in PDCoV S and explains most of the 
discrepancy in the number of sites. Strikingly, numerous glycans identified in the 
PDCoV S structure overlap with glycans in the HCoV-NL63 S protein, either strictly or 
topologically, with most differences towards the viral membrane distal end of the 
molecule (Fig 2D-E). Transmission of zoonotic viruses into humans can result in drastic 
changes in glycosylation, as exemplified by the human influenza H3 hemagglutinin that 
has doubled its number of glycosylation sites since the 1968 pandemic although its 
amino acid sequence remains ~88% identical(37). There is considerable sequence 
divergence between the HCoV-NL63 and PDCoV S glycoproteins, which share 43% 
amino acid sequence identity. The observation that numerous glycosylation sites are
conserved between the two proteins suggests that α- and δ-coronaviruses could face
similar immune pressure in their respective hosts, and that the areas that are masked
by the conserved glycans are key to the function of these S glycoproteins. Based on the
information gained from the HCoV-NL63 S structure(22), for which glycans appear to
contribute to masking the receptor-binding loops from antibody recognition, we suggest
that the glycan shield of PDCoV S and other coronavirus S glycoproteins could assist in
immune evasion similarly to the well-characterized HIV-1 envelope trimer(35, 36).


Finally, coronavirus S glycans have previously been proposed to participate in host cell
entry(38), since L-SIGN lectin can be used as an alternative receptor by SARS-CoV(39)
and HCoV-229E(40), and it is conceivable they play a similar role for other S proteins.


Architecture of the S1 receptor-binding subunit

The PDCoV and HCoV-NL63 S1 subunits exhibit strikingly similar structures
(r.m.s.d.=2.7 Å over 448 aligned Cα positions), except for the absence of the N-terminal
domain 0 in the former glycoprotein (Fig 3A). Deletion of domain 0, which is responsible
for attachment to sialoglycans, in the porcine transmissible gastro-enteritis virus (TGEV)
S gene, gave rise to porcine respiratory coronavirus (PRCV) and in turn resulted in a
loss of enteric tropism(41, 42). PDCoV and HCoV-NL63, however, exhibit opposite
behavior, as they target the enteric or the respiratory tracts despite the absence or
presence of a domain 0 in their S glycoproteins, respectively. We describe below the
functionally-relevant similarities and differences detected in the PDCoV S structure
relative to other coronavirus S structures.


Domain A is located at the viral membrane distal side and accounts for a large part of the
exposed surface area of the S1 subunit. It folds as a galectin-like β-sandwich
supplemented with a helix on the viral membrane distal side and a three-stranded
antiparallel β-sheet plus a helix on the proximal side. The domain A surface is heavily
glycosylated and features 7 glycans for PDCoV (Fig 3D). We previously reported that
the HCoV-NL63 S glycan linked to Asn358 (domain A) points towards the
receptor-binding domain B, masking residues involved in receptor recognition. A marked
difference between the A domains of PDCoV S and HCoV-NL63 S is that the β-hairpin
harboring Asn358 in HCoV-NL63 S features a deletion of 10 residues significantly
shortening it in PDCoV S (Fig 3A-C). Moreover, the topologically equivalent glycan
linked to residue Asn-184 of PDCoV S is protruding away from domain B and does not
significantly cover it, in contrast to what was observed for HCoV-NL63 (Fig 3A-C).


OC43, HKU1 and bovine coronavirus (BCoV) are known to use 9-O-acetyl-sialylated
cellular receptors for attachment to host cells(43, 44). Structural and biochemical
studies showed that domain A mediates these interactions and mapped key residues
involved(25, 45), and nanoparticle-displayed multimeric OC43 S1 subunit exhibited a high
hemagglutination titer (Fig. 3 F). Comparison of the PDCoV, HKU1 and BCoV domain A
structures indicated that PDCoV cannot interact with 9-O-acetyl-sialoglycans in a similar way
due to the absence of the strictly conserved residues involved in binding (BCoV Tyr162,
Glu182, Trp184 and His185) and of the loops delineating the binding cavity (Fig. 3 D-E).
In line with this observation, isolated or nanoparticle-displayed multimeric PDCoV S1
subunit failed to interact with sialic acid using an erythrocyte hemagglutination assay
(Fig. 3 F), indicating that sialic acid (or at least the types of sialoglycans displayed on
these erythrocytes) does not participate in PDCoV S attachment to host cells.


Domain B folds as a β-sandwich reminiscent of the equivalent domain of
α-coronaviruses such as PRCV (r.m.s.d.= 2.1 Å over 108 aligned Cα positions),
HCoV-NL63 (r.m.s.d.= 1.9 Å over 107 aligned Cα positions) and TGEV (r.m.s.d.= 3.0 Å over
109 aligned Cα positions) (Fig 4A-D)(22, 46, 47). The two PDCoV S glycosylation sites
identified at Asn-311 and Asn-331 in domain B are topologically or strictly conserved
with the HCoV-NL63 S glycans linked to Asn-486 and Asn-512, respectively. PRCV and
TGEV B domains also feature topologically similar glycosylation sites on the solvent
exposed surface of the β-sandwich and these glycans are likely to limit the immune
response against this domain, which is known to be the target of neutralizing antibodies
for several coronaviruses(47-52). The glycan linked to Asn-506 in HCoV-NL63 S is
absent in PDCoV S for which the equivalent residue is Ser-325 (Fig 4A and C). Since
masking of receptor-binding residues has been suggested to assist HCoV-NL63
immune evasion(22), the reduced overall glycan coverage of PDCoV domain B could
result from weaker immune pressure directed at the receptor-binding region in pigs
compared to HCoV-NL63 S in humans.


Previous work showed that the loops located at the viral membrane distal end of the
β-sandwich of domain B in α-coronavirus S glycoproteins are responsible for binding to
diverse host receptors such as ACE2 (HCoV-NL63)(46) or pAPN (PRCV/TGEV)(47).
Although the distal loops are significantly shorter for PDCoV compared to these two
α-coronaviruses, loop 1 and loop 3 contain several aromatic residues (Fig 4A-D). Since
aromatic residues in these loops have been shown to directly participate in receptor
binding for HCoV-NL63, PRCV and TGEV, we speculate that they could also mediate
interactions of the PDCoV B domain with its receptor. As is the case for HCoV-NL63 S,
the PDCoV B domain has an opposite orientation, related by a ~180° rotation, to the
equivalent domain of β-coronavirus S glycoproteins(21). This results in burying the
distal loops of the β-sandwich through interactions with domain A belonging to the same
protomer and in turn restrains the availability of the putative receptor-binding motif to
interact with the receptor (Fig 3B-C). As a result, it is likely that PDCoV and HCoV-NL63
S glycoproteins can undergo similar conformational changes to those described for
domain B of SARS-CoV and MERS-CoV S to interact with their cognate receptors(23,
24, 26). A major difference, however, is that β-coronavirus S using domain B as
receptor-binding domain appear to spontaneously undergo these rearrangements
whereas α- and δ-coronavirus S do not and rely on a yet unidentified stimulus(22).


Organization of the S2 fusion machinery 

The C-terminal S2 subunit trimer fuses the viral and cellular membranes at the onset of 
infection and is the most conserved region among coronavirus S glycoproteins. The 
PDCoV S2 subunit is structurally similar to α- and β-coronavirus S2 subunits such as
HCoV-NL63(22) (r.m.s.d.= 1.7 Å over 413 aligned Cα positions) and MHV(21) (r.m.s.d.=
2.2 Å over 291 aligned Cα positions), respectively. The conserved architecture of the S2
fusion machinery across multiple genera highlights that coronaviruses rely on a
common fusion mechanism to enter host cells(53). Despite these striking similarities,
coronavirus fusion machineries exhibit differences with key functional implications for
their activation mechanism and potential for zoonotic spillover.


The S2’ activation loop, which connects the upstream helix to the fusion peptide and
regulates the spatial and temporal activation of fusion, is resolved in the PDCoV S
cryoEM map (Fig 5A-B), as was the case for the HCoV-NL63 S(22) and SARS-CoV S
structures(23, 24). However, the PDCoV S2’ loop is short (LTTRIGGR) and comprises 6
and 3 fewer residues than the HCoV-NL63 S (LPQRNIRSSRIAGR) and SARS-CoV S
(ILPDPLKPTKR) counterparts, respectively. In addition, the S2’ loop of these viruses
contains multiple positive charges, including two putative furin cleavage sites for
HCoV-NL63 S, whereas the PDCoV S2’ loop harbors a single positively charged residue
(Arg-669) in addition to the conserved Arg-673 residue (Fig 5A-B). These structural features
help rationalize the known protease requirements for fusion activation of the
HCoV-NL63 S glycoprotein, which is preferentially cleaved by furin in the endosomes, and of
the SARS-CoV S glycoprotein, which is preferentially processed by cathepsin L in the
endo-lysosomes, and explain the fact that trypsin-like TMPRSS proteases can also
trigger both proteins(10, 15). The paucity of positive charges in the PDCoV S2’ trigger
loop is in line with the requirement for trypsin or other pancreatic proteases to allow
virus passaging and the fact that PDCoV is exposed to high concentrations of such
proteases in the enteric tract of infected pigs(28).
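
The loop lengths and positive-charge counts quoted above can be checked directly from the three sequences given in the text. The short sketch below does so; treating only Lys and Arg as basic (histidine excluded) and numbering the PDCoV loop so that its last residue is Arg-673 are assumptions made for this illustration.

```python
# S2' activation loop sequences as quoted in the text.
S2_PRIME_LOOPS = {
    "PDCoV": "LTTRIGGR",
    "HCoV-NL63": "LPQRNIRSSRIAGR",
    "SARS-CoV": "ILPDPLKPTKR",
}

# Assumption: only Lys and Arg are counted as positively charged residues.
BASIC_RESIDUES = frozenset("KR")


def loop_summary(loop: str) -> tuple[int, int]:
    """Return (loop length, number of basic residues) for a loop sequence."""
    return len(loop), sum(residue in BASIC_RESIDUES for residue in loop)


pdcov_length, _ = loop_summary(S2_PRIME_LOOPS["PDCoV"])
for virus, loop in S2_PRIME_LOOPS.items():
    length, n_basic = loop_summary(loop)
    print(f"{virus:10s} length={length:2d} basic={n_basic} "
          f"extra residues vs PDCoV={length - pdcov_length}")

# Residue numbers of the basic positions in the PDCoV loop, assuming the
# final residue of LTTRIGGR is the conserved Arg-673.
pdcov_loop = S2_PRIME_LOOPS["PDCoV"]
first_residue = 673 - len(pdcov_loop) + 1
pdcov_basic_positions = [
    first_residue + index
    for index, residue in enumerate(pdcov_loop)
    if residue in BASIC_RESIDUES
]
print("PDCoV basic residues:", pdcov_basic_positions)
```

Under these assumptions the PDCoV loop carries two basic residues, consistent with Arg-669 and Arg-673, against four and three for the HCoV-NL63 and SARS-CoV loops.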


Studies on influenza hemagglutinin highlighted that glycans can modulate cleavage site
accessibility to proteases and in turn influence fusion activation(54, 55). Similar
observations were drawn from comparisons between the MERS-CoV and HKU4 S
glycoproteins(14). Notably, the PDCoV S glycans linked to Asn-652 and Asn-661
decorate the trimer surface near the S2’ trigger loop and could limit accessibility to
proteases and play a role in orchestrating fusion activation (Fig 5B). These glycans are
conserved with an identical structural organization in HCoV-NL63 S and may have the
same putative function(22).


Sequence alignment of representative S glycoproteins from viruses of the four
coronavirus genera shows that α- and δ-coronaviruses feature a 14-residue long
insertion in the heptad-repeat 1 (HR1) and in the HR2 motifs, corresponding to two
heptad-repeats, compared to β-coronaviruses (S2A Fig). γ-coronaviruses form a
heterogeneous group comprising S glycoproteins without insertion but also S with one
(BeCoV-SW1 and BdCoV-HKU22) or two (TurkeyCoV-MG10) additional heptad-repeats
in HR1 and in HR2 compared to β-coronaviruses. The HR1 insertion is resolved in the
PDCoV S cryoEM map (Fig 5C, residues 797-811) and corresponds to the addition of
two helical turns (also visible in the HCoV-NL63 S structure) preceded by a loop (poorly
resolved in the HCoV-NL63 S structure). This polypeptide segment is known to refold to
form a central triple helical coiled-coil in the post-fusion S structure(53). The HR2
insertion cannot be visualized as this region is disordered in the PDCoV S
reconstruction and in all other coronavirus pre-fusion S structures. Mapping the HR1
and HR2 insertions on the HCoV-NL63 S post-fusion core X-ray structure(56), however,
reveals that these polypeptide segments are directly interacting within the 6-helix bundle
(S2B Fig). This suggests that the strict correlation of their presence or absence in both
HR1 and HR2, along with the observation that insertions always correspond to
an integer number of heptad repeats, is necessary to maintain the proper geometry of
the fusion machinery and allow the conserved conformational changes driving
membrane merger to take place with high efficiency.
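
The argument that insertions must span an integer number of heptad repeats can be made concrete with a small sketch: in a coiled coil, residues are labeled a to g with a period of seven, so an insertion shifts the register of every downstream residue unless its length is a multiple of seven. The function names and the example positions below are hypothetical and serve only to illustrate this reasoning.

```python
HEPTAD_REGISTER = "abcdefg"  # canonical coiled-coil positions, period of 7


def register_after_insertion(position: int, insertion_length: int) -> str:
    """Heptad letter of a downstream residue after an upstream insertion.

    `position` is the 0-based index of the residue counted from a residue
    occupying heptad position 'a' before the insertion.
    """
    return HEPTAD_REGISTER[(position + insertion_length) % 7]


def preserves_register(insertion_length: int) -> bool:
    """True when the insertion is an integer number of heptad repeats."""
    return insertion_length % 7 == 0


# The 14-residue insertion described in the text equals two heptad repeats.
for length in (7, 10, 14):
    print(f"insertion of {length:2d} residues: "
          f"{length / 7:.2f} heptads, register preserved = {preserves_register(length)}")

# A residue at a core 'd' position (index 3) stays at 'd' after a 14-residue
# insertion, but would move to 'g' after a hypothetical 10-residue insertion.
print(register_after_insertion(3, 14), register_after_insertion(3, 10))
```

Keeping the a and d positions in phase is what allows the hydrophobic seam of HR1 and the packing of HR2 against it to be maintained in the post-fusion 6-helix bundle.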


Discussion 

Structural and functional studies of coronavirus S glycoproteins are key to
understanding host and tissue tropism as well as the mechanisms of receptor binding
and fusion activation. The data reported in this manuscript establish a strong
connection between α- and δ-coronavirus S glycoproteins. HCoV-NL63, PRCV, TGEV
and PDCoV B domains fold as similar β-sandwiches that are structurally distinct from
the single β-sheet observed for the equivalent domain of β-coronaviruses(4, 5, 21, 25,
47). Moreover, the structures of HCoV-NL63 S and PDCoV S show that both
glycoproteins share a common organization of their S1 subunits in which the B domain
directly interacts with domain A from the same subunit to potentially limit accessibility of
the receptor-binding loops to neutralizing antibodies. Sequence analysis indicates a
strict correlation of the presence or absence of the HR1/HR2 insertions in the S
glycoprotein sequence and an apparent evolutionary pressure restricting
insertions/deletions to heptad-repeat units, which we postulate to be necessary for
efficient S refolding and fusion. Based on these criteria, we put forward that α- and
δ-coronavirus S glycoproteins share closer evolutionary relationships with each other than
they do with S of the other two coronavirus genera although insertions in HR1 and HR2
have also been detected in a subset of γ-coronavirus S proteins.


We previously recapitulated in vitro the proteolytic activation of MHV, SARS-CoV and
MERS-CoV pre-fusion S trimers, via trypsin incubation under limited proteolysis
conditions, which led to spontaneous refolding into post-fusion S2 trimers (the ground
state of the fusion reaction)(53). In contrast, the PDCoV S glycoprotein remained largely
uncleaved even after extended incubation times with up to a 5:1 molar ratio of S to trypsin
or chymotrypsin (0.1 mg/ml) (Figure 6). These results suggest that fusion activation of
PDCoV S, which is believed to be promoted by trypsin(28), involves an additional step
to expose the S2’ cleavage site, such as the receptor-binding induced conformational
changes described for MERS-CoV(13), SARS-CoV(57), MHV(58) and PEDV(16).
These distinct protease sensitivities are reminiscent of the differences reported between
clinical isolates (CV777) and cell-culture adapted (caDR13) PEDV strains for which
infectivity strictly requires or is hampered by trypsin, respectively(16). We put forward
that the S glycoprotein sequence and in turn structure of PDCoV and PEDV CV777
have evolved to be resistant to pancreatic proteases to which both viruses are exposed
in the enteric tract during infection. Fine-tuning of fusion activation is likely achieved by
restraining access to the S2’ cleavage site until receptor-binding occurs at the host cell
surface. This event could promote conformational changes exposing the S2’ site to allow
processing by trypsin or other proteases with exquisite spatial and temporal
coordination. In contrast, SARS-CoV, MERS-CoV or MHV are not expected to be
exposed to pancreatic proteases during the virus life-cycle and their S glycoproteins
presumably did not evolve with this selection pressure, explaining their sensitivity to
trypsin and chymotrypsin. In agreement with what has been postulated for
SARS-CoV(57) and PEDV caDR13(16), trypsin sensitivity could result in premature
cleavage/triggering of the pre-fusion S trimer and attenuation of infectivity and viral
fitness.


While completing this study, another group also determined a cryoEM reconstruction of
the PDCoV S glycoprotein ectodomain and both structures can be superposed with
excellent agreement (r.m.s.d. of 1.1 Å over 959 aligned Cα positions).


A gene fragment encoding the PDCoV S ectodomain (residues 20-1098, Uniprot:
W8Q9Y7) was PCR-amplified from a plasmid containing the full-length S gene. The 
PCR product was ligated to a gene fragment encoding a GCN4 trimerization motif 
(LIKRMKQIEDKIEEIESKQKKIENEIARIKKIK)(21, 59, 60), a thrombin cleavage site 
(LVPRGSLE), an 8-residue long Strep-Tag (WSHPQFEK) and a stop codon, as 
previously described(61). Subsequent cloning was performed in the pMT/BiP/V5-His
expression vector (Invitrogen) in frame with the Drosophila BiP secretion signal
downstream of the metallothionein promoter.

A human codon-optimized gene encoding the ectodomain (residues 14-1180) of the
SARS-CoV S protein (UniProt: P59594) was cloned into a modified pOPING vector(62)
(Addgene) introducing an N-terminal Mu-phosphatase signal peptide and, at the C-terminus
of the construct, a TEV protease cleavage site, a foldon and a hexa-histidine tag.


Production of recombinant PDCoV S ectodomain in Drosophila S2 cells 

To generate a stable Drosophila S2 cell line expressing the recombinant PDCoV S
ectodomain, we used Effectene (Qiagen) and 2 µg of plasmid. Puromycin
N-acetyltransferase was co-transfected as a dominant selectable marker. Stable PDCoV
S-expressing cell lines were selected by addition of 7 µg/mL puromycin (Invitrogen) to
the culture medium 48 h after transfection. For large-scale production, the cells were
cultured in spinner flasks and induced with 5 µM CdCl₂ at a density of approximately
10⁷ cells per mL. After a week at 28 °C, clarified cell supernatants were concentrated
40-fold using Vivaflow tangential filtration cassettes (Sartorius, 10 kDa cutoff) and
adjusted to pH 8.0 before affinity purification using a StrepTactin Superflow column
(IBA), followed by gel filtration chromatography using a Superose 6 10/300 GL column
(GE Life Sciences) equilibrated in 20 mM Tris-HCl pH 7.5 and 100 mM NaCl. The
concentration of the purified protein was estimated using absorption at 280 nm.
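
Estimating concentration from the absorbance at 280 nm relies on the Beer-Lambert law, A = ε·c·l. The following is a minimal sketch of that conversion, not the authors' procedure: the function name is hypothetical, and the extinction coefficient and molecular weight are illustrative placeholders because the text does not report the values used.

```python
def concentration_mg_per_ml(a280, ext_coeff_m1_cm1, mol_weight_da, path_cm=1.0):
    """Convert an A280 reading to mg/mL using Beer-Lambert (A = epsilon * c * l).

    ext_coeff_m1_cm1: molar extinction coefficient in M^-1 cm^-1 (assumed input).
    mol_weight_da:    molecular weight in Da, i.e. g/mol (assumed input).
    """
    if ext_coeff_m1_cm1 <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (ext_coeff_m1_cm1 * path_cm)  # mol/L
    return molar * mol_weight_da                  # g/L, numerically equal to mg/mL


# Placeholder numbers only; they are not taken from the study.
example = concentration_mg_per_ml(a280=0.60, ext_coeff_m1_cm1=150_000.0,
                                  mol_weight_da=125_000.0)
```

In practice the extinction coefficient would be computed from the sequence of the expressed construct, including the trimerization motif and tag.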


Production of recombinant SARS S ectodomain in HEK293F cells 

Transient transfection of 250 mL of HEK293F cells at a density of 10⁶ cells/mL was
performed using 293fectin (ThermoFisher) and Opti-MEM (ThermoFisher). After 3 days
the cells were harvested before affinity purification with a 5 mL Talon cobalt column
equilibrated in 25 mM sodium phosphate pH 8.0, 300 mM NaCl, 10 mM imidazole. The
purified protein was buffer exchanged into 20 mM Tris pH 8.0, 150 mM NaCl and
concentrated to 1.0 mg/mL.


CryoEM specimen preparation and data collection 

Two microliters of purified PDCoV S at ~0.5 mg/mL was triple-blotted(31) using 1.2/1.3
C-flat grids (Protochips), which had been glow discharged for 30 seconds at 20 mA.
Grids were then plunge-frozen in liquid ethane using an FEI Mark I Vitrobot with 7.5
seconds blot time and an offset of -3 mm at 100% humidity and 25 °C. Data were
collected using SerialEM automatic data collection software(63) on an FEI Titan Krios
operated at 300 kV and equipped with a Gatan Quantum GIF energy filter operated in
zero-loss mode with a slit width of 20 eV and a Gatan K2 Summit direct electron
detector camera operated in super-resolution mode. The dose rate was adjusted to ~5
counts/pixel/s and each movie was acquired in counting mode fractionated in 75 frames
of 200 ms. 2,000 micrographs were collected in a single session using a defocus range
between 1.5 and 4.0 µm.


CryoEM data processing 

Frame alignment was carried out using MotionCor2(64). The parameters of the
microscope contrast transfer function were initially estimated using GCTF(65) and then
using CTFFIND4(66). Particles were automatically picked using DoGPicker(67). Particle
images were extracted and processed using Relion 2.0(68) with a box size of 640
pixels² and a pixel size of 0.665 Å. Following reference-free 2D classification, we ran 3D
classification with C1 symmetry(69) using an initial model generated with
e2initialmodel.py in EMAN2. 455,710 particles were selected to run a gold-standard 3D
refinement imposing C3 symmetry using Relion 2.1 (70) that led to a map at 3.5 Å
resolution. Post-processing was done using Relion to apply an automatically generated
B factor of -150 Å². Reported resolutions are based on the gold-standard FSC=0.143
criterion(70, 71) and Fourier shell correlation curves were corrected for the effects of
soft masking by high-resolution noise substitution(72). The soft mask used for FSC
calculation had a 10 pixel cosine edge fall-off.
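
To illustrate how a resolution is read off a Fourier shell correlation curve with the FSC=0.143 criterion, the sketch below finds the first shell at which the curve drops below the threshold and interpolates linearly between the neighbouring shells. It is an illustrative helper only (the function name and the toy curve are assumptions), not the Relion implementation, and it does not perform the mask correction described above.

```python
def fsc_resolution(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in Å) at the first downward crossing of `threshold`.

    spatial_freqs: shell frequencies in 1/Å, in increasing order.
    fsc_values:    FSC value for each shell.
    Returns None if the curve never crosses the threshold from above.
    """
    shells = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(shells, shells[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two shells bracketing the crossing.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Toy curve for demonstration; these are not the values from this study.
toy_freqs = [0.10, 0.20, 0.30, 0.40]
toy_fsc = [1.000, 0.900, 0.243, 0.043]
toy_resolution = fsc_resolution(toy_freqs, toy_fsc)
```

A return value of None means the half-map correlation stays at or above the threshold out to the last shell supplied, in which case the estimate is limited by the sampling rather than by the data.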


Model building and analysis 

UCSF Chimera(73) was used to fit the HCoV-NL63 S structure(22) into the cryoEM map
before manual rebuilding in Coot(32, 74). Glycans at NXS/T motifs were initially
built by hand into the density where visible, and glycan geometry was then
refined using Rosetta, optimizing the fit-to-density as well as the energetics of
protein/glycan contacts. The final model was refined using the symmetric modeling
framework in Rosetta(33, 75). The quality of the final model was analyzed with
MolProbity(76) and Privateer(77). All figures were generated with UCSF Chimera(73).


Mass Spectrometry 

250 pmol of PDCoV S was incubated in a freshly prepared solution containing 100 mM
Tris pH 8.5, 2% sodium deoxycholate, 10 mM tris(2-carboxyethyl)phosphine, and 40 mM
iodoacetamide at 95 °C for five minutes followed by 25 °C for thirty minutes in the dark.
80 pmol of denatured, reduced, and alkylated PDCoV S was then diluted into freshly
made 50 mM ammonium bicarbonate and incubated for 14 hours at 37 °C with trypsin
(Sigma Aldrich), chymotrypsin (Sigma Aldrich) or alpha lytic protease (Sigma Aldrich)
at a 1:75 (w:w) ratio. Formic acid was then added to a final concentration of 2% to
precipitate the sodium deoxycholate in the samples, followed by centrifugation at 14,000
rpm for 20 minutes. The supernatant containing the (glyco)-peptides was collected and
spun again at 14,000 rpm for 5 min immediately before sample analysis. For each
sample, 8 µL was injected on a Thermo Scientific Orbitrap Fusion Tribrid mass
spectrometer. A 35-cm analytical column and a 3-cm trap column filled with
ReproSil-Pur C18AQ 5 µm (Dr. Maisch) beads were used. Nanospray LC-MS/MS was used to
separate peptides over a 110-min gradient from 5% to 30% acetonitrile with 0.1% formic
acid. A positive spray voltage of 2,100 V was used with an ion-transfer-tube temperature
of 350 °C. An electron-transfer/higher-energy collision dissociation ion-fragmentation
scheme (34) was used with calibrated charge-dependent ETD parameters and
supplemental higher-energy collision dissociation energy of 0.15. A resolution setting of
120,000 with an AGC target of 2 × 10⁵ was used for MS1, and a resolution setting of
30,000 with an AGC target of 1 × 10⁵ was used for MS2. Data were searched with
Protein Metrics Byonic software (78), using a small custom database of recombinant
protein sequences including several coronavirus spike proteins, other viral glycoproteins
and the proteases used to prepare the glycopeptides. Reverse decoy sequences were
also included in the search. Specificity of the search was set to C-terminal cleavage at
R/K (trypsin), F/W/Y/M/L (chymotrypsin) or T/A/S/V (alpha lytic protease) allowing up to
two missed cleavages, with EThcD fragmentation (b/y- and c/z-type ions). We used a
precursor mass tolerance of 12 ppm and a product mass tolerance of 24 ppm.
Carbamidomethylation of cysteines was set as a fixed modification, methionine oxidation
as a variable modification, and all four software-provided N-linked glycan databases were
combined into a single non-redundant list used to identify glycopeptides. All
glycopeptide hits were manually inspected and only those with quality peptide sequence
information are reported here.
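
The search settings above (C-terminal cleavage at protease-specific residues, up to two missed cleavages, and ppm mass tolerances) can be sketched as follows. This is an illustrative stand-in, not the Byonic algorithm: the function and dictionary names are hypothetical, and no additional rule (such as suppressing cleavage before proline) is applied because the text does not mention one.

```python
# Cleavage residues as listed in the search specificity.
CLEAVAGE_RESIDUES = {
    "trypsin": "RK",
    "chymotrypsin": "FWYML",
    "alpha_lytic_protease": "TASV",
}


def digest(sequence, protease, max_missed=2):
    """Enumerate peptides from C-terminal cleavage with up to `max_missed` missed sites."""
    if not sequence:
        return []
    sites = set(CLEAVAGE_RESIDUES[protease])
    # Cut after every cleavage residue except the final residue of the sequence.
    cuts = [0] + [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in sites]
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 1 + max_missed, len(cuts) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm):
    """True if the absolute ppm error does not exceed `tol_ppm`."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


# Made-up sequence and masses for demonstration only.
tryptic = digest("AKRGK", "trypsin", max_missed=1)
precursor_ok = within_tolerance(1000.010, 1000.000, tol_ppm=12)
```

With the 12 ppm precursor tolerance, a peptide of mass 1000 Da is accepted only if the observed mass lies within about 0.012 Da of the theoretical value.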


Proteolysis of PDCoV S and SARS S glycoproteins 
Proteins at a concentration of 0.5 mg/mL (PDCoV S) or 1 mg/mL (SARS-CoV S) were
incubated with 0.1 mg/mL of either trypsin (Sigma Aldrich) or chymotrypsin at 22 °C for
two hours. This reaction was then used for analysis by SDS-PAGE.


Accession number(s) 


The mass-spectrometry data have been deposited to PRIDE with accession
code PXD007107 and include the raw data, Byonic search results and the databases
used for protein sequences and N-linked glycan modifications. The EM map and PDB
model have been deposited with accession codes EMD-7094 and 6BFU.


Figure Captions

Fig 1. CryoEM structure of the PDCoV S protein. A, A representative micrograph of
vitreous ice-embedded PDCoV S protein at 3.4 µm defocus. Scale bar: 510 Å. B,
Selected 2D class averages of the PDCoV S protein. Scale bar: 85 Å. C-D, Side (C) and
top (D) views of the PDCoV S cryoEM map filtered at 3.5 Å resolution and sharpened
with a B-factor of -150 Å². The density is colored per protomer. E-F, Ribbon
representation of the PDCoV S trimer structure rendered with the same orientations as
in panels C-D. One protomer is colored according to the indicated structural domains
whereas the other two protomers are colored gray.


Journal of Virology 


Fig 2. Glycosylation profile of the PDCoV S protein. A-B, Two orthogonal views of 
the PDCoV S trimer rendered as ribbons. Glycan density extracted from the 
unsharpened reconstruction is colored green for one protomer and grey for the other 
two protomers. Labels indicate the position of N-linked glycosylated asparagine 
residues. C, Schematic summary of all detected N-linked glycans. Each site shows the 
most extensive glycan structure detected, either by mass-spectrometry or cryoEM. A full 
overview of all detected N-linked glycans is provided in Supplementary Table 1. Glycan 
moieties are represented as symbols according to the key and the structural domains are
individually colored and indicated in a linear representation of the PDCoV S sequence.
D-E, Ribbon representation of PDCoV (D) and HCoV-NL63 (E) S protomers with
glycans visualized by cryoEM shown as green spheres.


Fig 3. Structural features of the PDCoV S1 subunit and the galectin-like domain A.
A, Superposition of the PDCoV and HCoV-NL63 S1 subunits highlights the absence of
domain 0 in PDCoV S. B, View of the interface between PDCoV S A and B domains
showing the Asn-184 glycan points away from domain B. C, View of the interface
between HCoV-NL63 S A and B domains showing the Asn-358 glycan contributes to
masking the receptor-binding loops. D, Ribbon representation of PDCoV domain A. E,
Ribbon representation of BCoV domain A oriented identically to panel (D). Highly
conserved residues involved in sialic acid recognition are shown in ball and stick
representation. Glycans are rendered as spheres in panels A-C or sticks in panels D-E
and colored per atom type (carbon: green, nitrogen: blue and oxygen: red). F, The
PDCoV S1 subunit C-terminally tagged with the Fc portion of human IgG (S1-Fc) was
tested for its hemagglutination potential of an erythrocyte suspension of human or rat
origin, either alone or premixed with protein A-coupled nanoparticles to increase the
avidity of S1-Fc proteins for sialic acids. The sialic-acid binding S1 subunit of
HCoV-OC43 (GenBank: AAR01015.1) C-terminally fused to human Fc portion was used as a
positive control. ‘Mock’ indicates the condition where no S1 subunit was used (negative
control). Wells positive for hemagglutination are encircled.


Fig 4. Structural comparison of α- and δ-coronavirus receptor-binding domains.
A-D, Ribbon rendering of the receptor-binding domain (domain B) of the δ-genus
PDCoV S (A) and α-genus PRCV S (B), HCoV-NL63 S (C) and TGEV S (D). Loops that
have been implicated in receptor-binding for α-coronaviruses are indicated. Key
aromatic residues that have been shown to take part in α-coronavirus receptor-binding
and putatively involved in δ-coronavirus receptor-binding are highlighted. Disulphide
bonds that stabilise receptor binding loops are indicated and glycans within the domain
are shown as sticks (carbon: green, nitrogen: blue and oxygen: red).


Fig 5. Structural features of the PDCoV S2 subunit. A, Ribbon representation of the
PDCoV S trimer with the S2 subunit core of one protomer colored from blue to red (from
N-terminus to C-terminus). B, Zoomed-in view of the S2' activation loop region. Two
glycans, linked to Asn-669 and Asn-673, that are strictly conserved in HCoV-NL63 S are
shown as sticks (carbon: green, nitrogen: blue and oxygen: red). For comparison, the
equivalent residues in the HCoV-NL63 S protein are indicated in gray. C, The PDCoV S
glycoprotein features an insertion of 14 amino acid residues in HR1, compared to the
β-coronavirus MHV S protein, folding as an extended loop and a helical extension of two
turns. The residues accounting for this HR1 insertion interact with the complementary
insertion in HR2 in the post-fusion conformation (Fig S2 B).


Fig 6. The PDCoV S glycoprotein is resistant to digestive enzymes. Purified SARS
S (1 mg/ml) and PDCoV S (0.5 mg/ml) glycoproteins were incubated with 0.1 mg/ml
trypsin or chymotrypsin for 2 hours at 22 °C. The digestion reactions were analyzed on
a 12% SDS-PAGE gel. After incubation, the SARS S protein was extensively
proteolyzed whereas a large fraction of the PDCoV S protein remains intact.


Table 1 Data collection and refinement statistics

Data collection
  Number of particles              455710
  Pixel size (Å)                   1.33
  Voltage (kV)                     300
  Electron dose (e-/Å²)            23.5
Refinement
  Resolution (Å)                   3.5
  Map-sharpening B factor (Å²)     -150
Model validation
  Favored rotamers (%)             98
  Poor rotamers (%)                0.35
  Ramachandran allowed (%)         99.69
  Ramachandran outliers (%)        0.31
  Clash score                      2.2
  MolProbity score                 1.27